Abstract

Consider the standard linear regression model $Y = D\theta + \varepsilon$ with given design matrix
$D$ ($n \times p$), unknown parameter $\theta$ ($p \times 1$) and unobserved error vector $\varepsilon$ ($n \times 1$) with i.i.d.
centered Gaussian components. Motivated by an application in economics, we compare
the ranks $R_i$ of the errors $\varepsilon_i$ with the ranks $\hat R_i$ of the residuals $\hat\varepsilon_i$, where $\hat\varepsilon = Y - D\hat\theta$
with the least squares estimator $\hat\theta$. Exact and approximate formulae are given for the rank
distortions $\sqrt{\mathbb{E}\,(\hat R_i - R_i)^2}$.

1 Introduction



This paper is motivated by a recent consulting case in which a company wants to evaluate
the performance of its $n$ offices at different locations. The performance of office no. $i$ is
quantified by a certain measure $Y_i$ of costs per unit, but it is clear that it is influenced by
various covariables $X_i(1), X_i(2), \ldots$ describing, for instance, regional factors which
cannot be altered by the offices. The idea is to eliminate these effects via a (linear)
regression model, assuming that

$$Y_i = f(X_i) + \varepsilon_i$$

for some unknown handicap function $f$ of the covariable vectors $X_i = (X_i(j))_{j=1,2,\ldots}$, where
the $\varepsilon_i$ are the corrected costs per unit of the $i$-th office.

Now the proposal is to use a linear model for the regression function $f$ and to estimate
it via least squares or least absolute deviations. That is, we determine a regression
function $\hat f$ within a given model $\mathcal{F}$ such that

$$\sum_{i=1}^n \hat\varepsilon_i^2 \quad\text{or}\quad \sum_{i=1}^n |\hat\varepsilon_i|$$

becomes minimal, where

$$\hat\varepsilon_i := Y_i - \hat f(X_i)$$

is the residual of the fitted regression function $\hat f$. Based on these residuals one computes
the ranks

$$\hat R_i := \text{rank of } \hat\varepsilon_i \text{ among } \hat\varepsilon_1, \hat\varepsilon_2, \ldots, \hat\varepsilon_n$$

as a surrogate for

$$R_i := \text{rank of } \varepsilon_i \text{ among } \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n.$$

This procedure is simpler than an established method of benchmarking, called data
envelopment analysis (DEA), initiated by Farrell (1957) and Charnes et al. (1978). Very
roughly speaking, in that approach one assumes that the $\varepsilon_i$ are non-negative, and $f(\cdot)$
describes the minimally achievable costs per unit. The corrected cost measures $\varepsilon_i$ are
estimated via a linear optimization method. The main reasons for the company to use the
regression approach rather than DEA were the higher complexity of DEA, which made it
difficult to communicate to employees, and the known sensitivity of DEA to errors in the
data. Moreover, normal quantile-quantile plots of the residuals showed no serious violation
of a Gaussian distribution, whereas the DEA paradigm would predict a non-symmetric,
right-skewed distribution.

This is certainly a non-standard application of regression methods in the sense that the
regression function is treated as a nuisance parameter while the "errors" are of primary
interest. The problem with that approach is that these "errors" may fail to satisfy common
assumptions such as independence, mean or median zero and homoscedasticity. Indeed,
it may happen that the numbers $\varepsilon_i$, as a measure of the offices' individual performance
(motivation, efficiency etc.), are correlated with the covariable vectors $X_i$. But then the
residuals $\hat\varepsilon_i$ are systematically different from the numbers $\varepsilon_i$. Incidentally, DEA may suffer
from the same problem, in particular when the performance of most offices is still far
from optimal.

Even if the assumed model $\mathcal{F}$ is correct and if the "errors" satisfy the standard
assumptions of being independent and all following the same distribution $\mathcal{N}(0, \sigma^2)$ for some
$\sigma > 0$, the average absolute difference between the ranks $\hat R_i$ and $R_i$ may be substantial.
In the present paper we derive an explicit expression for the "rank distortions"

$$\sqrt{\mathbb{E}\,(\hat R_i - R_i)^2},$$

i.e. upper bounds for $\mathbb{E}\,|\hat R_i - R_i|$, in case of traditional least-squares regression. Section 2
contains the main results, an exact formula and approximations. Section 3 provides a
heuristic derivation of an approximation of the rank distortions which also indicates
what may happen in non-Gaussian settings. Presumably these arguments could be made
rigorous by applying similar techniques and arguments as Koul (1969, 1992), Loynes (1980)
and Mammen (1996). Section 4 contains rigorous proofs which do rely on the errors
being independent with the same Gaussian distribution. The advantage of that is that
minimal assumptions are imposed on the underlying design matrix.



2 Results 



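We consider a linear regression model with a random vector

$$Y = D\theta + \varepsilon.$$

Here $D \in \mathbb{R}^{n \times p}$ is a given design matrix with $\mathrm{rank}(D) = p < n$, $\theta$ is an unknown parameter
vector in $\mathbb{R}^p$, and $\varepsilon$ is an unobserved random vector with distribution $\mathcal{N}_n(0, \sigma^2 I)$ with
unknown $\sigma > 0$. In our specific setting,

$$D = \begin{pmatrix}
f_1(X_1) & f_2(X_1) & \cdots & f_p(X_1) \\
f_1(X_2) & f_2(X_2) & \cdots & f_p(X_2) \\
\vdots & \vdots & & \vdots \\
f_1(X_n) & f_2(X_n) & \cdots & f_p(X_n)
\end{pmatrix}$$

with the observed covariable vectors $X_1, X_2, \ldots, X_n$ and a given basis $f_1, f_2, \ldots, f_p$ of
the model $\mathcal{F}$.

Let us recall some well-known facts from linear models (cf. Ryan 1997). The least-squares
estimator of $\theta$ is given by $\hat\theta := (D^\top D)^{-1} D^\top Y$, and the fitted vector $\hat Y := D\hat\theta$
may be written as $\hat Y = HY$ with the "hat matrix"

$$H := D (D^\top D)^{-1} D^\top \in \mathbb{R}^{n \times n}.$$

This matrix describes the orthogonal projection onto the column space of $D$ and satisfies
$H = H^2 = H^\top$. Moreover, since $0 \le v^\top H v \le 1$ for any unit vector $v \in \mathbb{R}^n$, one can
easily verify that all "leverages" $H_{ii}$ satisfy $0 \le H_{ii} \le 1$ and $\mathrm{trace}(H) = \sum_{i=1}^n H_{ii} = p$.
The residual vector $\hat\varepsilon := Y - \hat Y$ may be written as

$$\hat\varepsilon = (I - H)\varepsilon.$$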
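The projection properties recalled above can be checked on a small case. The following is a minimal sketch, not part of the original analysis: the toy design (columns $1$ and $x$ with $x = 1, \ldots, 5$) and the helper names `transpose`, `matmul`, `inverse` and `hat_matrix` are assumptions made for illustration. Exact rational arithmetic is used so that $H = H^2 = H^\top$ and $\mathrm{trace}(H) = p$ hold without rounding error.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Gauss-Jordan inverse of a small square matrix of Fractions."""
    m = len(A)
    M = [list(row) + [Fraction(int(r == c)) for c in range(m)]
         for r, row in enumerate(A)]
    for c in range(m):
        pivot = next(r for r in range(c, m) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(m):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[m:] for row in M]

def hat_matrix(D):
    """H = D (D^T D)^{-1} D^T."""
    Dt = transpose(D)
    return matmul(matmul(D, inverse(matmul(Dt, D))), Dt)

# Hypothetical toy design (an assumption for illustration): columns (1, x), x = 1..5.
n, p = 5, 2
D = [[Fraction(1), Fraction(x)] for x in range(1, n + 1)]
H = hat_matrix(D)
leverages = [H[i][i] for i in range(n)]
# G = I - H maps the error vector to the residual vector.
G = [[Fraction(int(i == j)) - H[i][j] for j in range(n)] for i in range(n)]

print(leverages)        # the diagonal entries H_ii
print(sum(leverages))   # equals p
```

Since $(I - H)D = 0$, the fitted part $D\theta$ drops out of the residuals, which is why $\hat\varepsilon$ depends on $\varepsilon$ only.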

Under a mild regularity condition on the hat matrix $H$, the residuals are pairwise
different:

Lemma 1. For arbitrary indices $1 \le i < j \le n$,

$$\mathbb{P}(\hat\varepsilon_i = \hat\varepsilon_j) = \begin{cases} 1 & \text{if } H_{ii} = H_{jj} = H_{ij} + 1, \\ 0 & \text{else.} \end{cases}$$

The condition $H_{ii} = H_{jj} = H_{ij} + 1$ implies that $H_{ii} \ge 1/2$.

This lemma remains valid if the errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are only assumed to be independent
with continuous distributions.
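To illustrate the exceptional case of Lemma 1, here is a small sketch with a hypothetical design that is not taken from the paper: $n = 3$, $p = 1$ and the single column $(1, -1, 0)^\top$. For this choice $H_{11} = H_{22} = 1/2$ and $H_{12} = -1/2$, so the condition $H_{ii} = H_{jj} = H_{ij} + 1$ holds for the first two indices and the corresponding residuals coincide for every error vector.

```python
import random

# Hypothetical one-column design (an assumption for illustration).
d = [1.0, -1.0, 0.0]
dd = sum(v * v for v in d)
# For p = 1 the hat matrix is d d^T / (d^T d).
H = [[a * b / dd for b in d] for a in d]

def residuals(eps):
    """Residual vector (I - H) eps."""
    return [e - sum(H[i][k] * eps[k] for k in range(len(eps)))
            for i, e in enumerate(eps)]

random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in range(3)]
res = residuals(eps)
print(H[0][0], H[1][1], H[0][1] + 1.0)   # the three quantities in the condition
print(res[0] - res[1])                    # zero up to rounding
```

In this example both tied residuals equal $(\varepsilon_1 + \varepsilon_2)/2$, so ranks of residuals are not well defined without a tie-breaking rule.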

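An immediate consequence of Lemma 1 is that the residuals are pairwise different
almost surely, whenever $H_{ii} \ge 1/2$ for at most one index $i$. Now we are ready to state our
first main result about the ranks $\hat R_i$ and $R_i$:

Theorem 1. Suppose that $H_{ii} \ge 1/2$ for at most one index $i \in \{1, 2, \ldots, n\}$. Then for
arbitrary indices $i, j \in \{1, 2, \ldots, n\}$,

$$\mathbb{E}\,(\hat R_i - R_i)(\hat R_j - R_j)
= \frac{1}{2\pi} \sum_{k,\ell=1}^n \Bigl(
\arcsin\Bigl(\frac{\Delta_{k\ell,ij}}{2}\Bigr)
+ \arcsin\Bigl(\frac{\Delta_{k\ell,ij} - H_{k\ell,ij}}{\sqrt{(2 - H_{kk,i})(2 - H_{\ell\ell,j})}}\Bigr)
- \arcsin\Bigl(\frac{\Delta_{k\ell,ij} - H_{k\ell,ij}}{\sqrt{2(2 - H_{kk,i})}}\Bigr)
- \arcsin\Bigl(\frac{\Delta_{k\ell,ij} - H_{k\ell,ij}}{\sqrt{2(2 - H_{\ell\ell,j})}}\Bigr)
\Bigr),$$

where $\Delta_{k\ell,ij} := \delta_{k\ell} + \delta_{ij} - \delta_{kj} - \delta_{i\ell}$, $H_{k\ell,ij} := H_{k\ell} + H_{ij} - H_{kj} - H_{i\ell}$ and $H_{k\ell,i} := H_{k\ell,ii}$.
In particular,

$$\mathbb{E}\,(\hat R_i - R_i)^2
= \frac{1}{2\pi} \sum_{k=1}^n \Bigl( \pi - 2 \arccos\bigl(\sqrt{H_{kk,i}/2}\,\bigr) \Bigr)
+ \frac{1}{\pi} \sum_{1 \le k < \ell \le n} \Bigl(
\frac{\pi}{6}
+ \arcsin\Bigl(\frac{1 - H_{k\ell,i}}{\sqrt{(2 - H_{kk,i})(2 - H_{\ell\ell,i})}}\Bigr)
- \arcsin\Bigl(\frac{1 - H_{k\ell,i}}{\sqrt{2(2 - H_{kk,i})}}\Bigr)
- \arcsin\Bigl(\frac{1 - H_{k\ell,i}}{\sqrt{2(2 - H_{\ell\ell,i})}}\Bigr)
\Bigr).$$

Here and throughout $\delta_{st}$ denotes Kronecker's symbol, i.e. $\delta_{st}$ equals one if $s = t$ and
zero otherwise.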
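For exact numerical work the two formulae of Theorem 1 translate directly into code. The sketch below is an illustration under stated assumptions rather than the implementation used in the consulting case: it takes simple linear regression with $n = 10$ equispaced points (a hypothetical choice), uses the closed-form hat matrix of that special case, and the function names are invented here. It evaluates the general double sum at $i = j$ and compares it with the specialised formula.

```python
import math

def hat_matrix_simple_linear(x):
    """Hat matrix for the design with columns (1, x); closed form for this case."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [[1.0 / n + (a - mean) * (b - mean) / sxx for b in x] for a in x]

def asin_clipped(t):
    # Guard against arguments such as 1.0000000000000002 caused by rounding.
    return math.asin(max(-1.0, min(1.0, t)))

def h4(H, k, l, i, j):
    """H_{kl,ij} = H_kl + H_ij - H_kj - H_il."""
    return H[k][l] + H[i][j] - H[k][j] - H[i][l]

def delta4(k, l, i, j):
    """Delta_{kl,ij} built from Kronecker symbols."""
    return (k == l) + (i == j) - (k == j) - (i == l)

def rank_cov_exact(H, i, j):
    """E (Rhat_i - R_i)(Rhat_j - R_j) via the double sum of Theorem 1."""
    n = len(H)
    total = 0.0
    for k in range(n):
        a = 2.0 - h4(H, k, k, i, i)
        for l in range(n):
            b = 2.0 - h4(H, l, l, j, j)
            d = delta4(k, l, i, j)
            c = d - h4(H, k, l, i, j)
            total += (asin_clipped(d / 2.0)
                      + asin_clipped(c / math.sqrt(a * b))
                      - asin_clipped(c / math.sqrt(2.0 * a))
                      - asin_clipped(c / math.sqrt(2.0 * b)))
    return total / (2.0 * math.pi)

def rank_var_exact(H, i):
    """E (Rhat_i - R_i)^2 via the specialised formula of Theorem 1."""
    n = len(H)
    s1 = 0.0
    for k in range(n):
        t = min(1.0, math.sqrt(max(0.0, h4(H, k, k, i, i)) / 2.0))
        s1 += math.pi - 2.0 * math.acos(t)
    s2 = 0.0
    for k in range(n):
        a = 2.0 - h4(H, k, k, i, i)
        for l in range(k + 1, n):
            b = 2.0 - h4(H, l, l, i, i)
            c = 1.0 - h4(H, k, l, i, i)
            s2 += (math.pi / 6.0 + asin_clipped(c / math.sqrt(a * b))
                   - asin_clipped(c / math.sqrt(2.0 * a))
                   - asin_clipped(c / math.sqrt(2.0 * b)))
    return s1 / (2.0 * math.pi) + s2 / math.pi

n = 10
H = hat_matrix_simple_linear([float(v) for v in range(n)])  # hypothetical design
var_direct = [rank_var_exact(H, i) for i in range(n)]
var_double = [rank_cov_exact(H, i, i) for i in range(n)]
for i in range(n):
    print(i, round(math.sqrt(var_direct[i]), 4))   # root mean squared rank distortion
```

The double sum costs $O(n^2)$ arcsine evaluations per index pair $(i, j)$, which is unproblematic for sample sizes such as those in the example of this section.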

Theorem 1 is useful for exact numerical calculations. It was used in the aforementioned
consulting case to show that rank distortions may be substantial. Numerical experiments
also revealed that the rank distortions are closely related to the leverages $H_{ii}$. Recall that

$$\mathrm{Var}(\hat\varepsilon_i) = \mathrm{Cov}(\hat\varepsilon_i, \varepsilon_i) = \sigma^2 (1 - H_{ii}) \quad\text{and}\quad \mathrm{Var}(\hat\varepsilon_i - \varepsilon_i) = \mathrm{Cov}(\varepsilon_i - \hat\varepsilon_i, \varepsilon_i) = \sigma^2 H_{ii}.$$

Here is a theoretical result about the rank distortions in case of small maximal leverage:

Theorem 2. Suppose that the column space of $D$ contains the constant vectors, i.e.
$H\mathbf{1} = \mathbf{1} := (1)_{i=1}^n$. Then, as $\eta := \max_{i=1,\ldots,n} H_{ii} \to 0$,

$$\mathbb{E}\,(\hat R_i - R_i)(\hat R_j - R_j) = \frac{n^2 H_{ij} - n}{2 \sqrt{4 - \delta_{ij}}\, \pi} + O\bigl(n \eta^{1/2} + n^2 \eta^2\bigr)$$

uniformly in $i, j \in \{1, 2, \ldots, n\}$.

A numerical example. Suppose that $n = 70$, and let $D$ be equal to



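$$D^{(1)} := \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}
\quad\text{or}\quad
D^{(2)} := \begin{pmatrix} 1 & X_1 & X_1^2 \\ 1 & X_2 & X_2^2 \\ \vdots & \vdots & \vdots \\ 1 & X_n & X_n^2 \end{pmatrix},$$

the design matrix for simple linear or quadratic regression, where $X_1 < X_2 < \cdots < X_n$
are equispaced numbers. Figure 1 shows the pairs $\bigl(i, \sqrt{\mathbb{E}\,(\hat R_i - R_i)^2}\,\bigr)$ for both cases.
In addition the approximations $n \sqrt{(H_{ii} - n^{-1}) / (2\pi\sqrt{3})}$ and $n \sqrt{H_{ii} / (2\pi\sqrt{3})}$ are shown as
lines.

Figure 1: Root mean squared rank distortions for simple linear regression (left) and
quadratic regression (right) with $n = 70$ equispaced $X$-values.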
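The two approximating curves of the example only require the leverages. A minimal sketch follows, under the assumption that the equispaced points are taken as $X_i = (i-1)/(n-1)$ (location and scale of the $X_i$ do not change the hat matrix); the function name `leverages` and the Gram-Schmidt route are choices made here, not taken from the paper.

```python
import math

def leverages(columns):
    """Diagonal of the hat matrix via modified Gram-Schmidt on the design columns."""
    n = len(columns[0])
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    # H = Q Q^T for an orthonormal basis Q of the column space, so H_ii = sum_j Q_ij^2.
    return [sum(q[i] ** 2 for q in basis) for i in range(n)]

n = 70
x = [i / (n - 1) for i in range(n)]                 # assumed equispaced grid
ones = [1.0] * n
lev_lin = leverages([ones, x])                       # simple linear regression
lev_quad = leverages([ones, x, [v * v for v in x]])  # quadratic regression

c = 2.0 * math.pi * math.sqrt(3.0)

def approximations(lev):
    with_correction = [n * math.sqrt(max(0.0, h - 1.0 / n) / c) for h in lev]
    without_correction = [n * math.sqrt(h / c) for h in lev]
    return with_correction, without_correction

approx_lin = approximations(lev_lin)
approx_quad = approximations(lev_quad)
print(round(approx_lin[0][0], 3), round(approx_lin[1][0], 3))    # office i = 1
print(round(approx_quad[0][0], 3), round(approx_quad[1][0], 3))
```

Both curves are largest at the boundary of the design, where the leverages are largest.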



3 Heuristics 

Asymptotic statements in this section are meant as $\eta = \max_{i=1,\ldots,n} H_{ii} \to 0$. We
assume that the errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent and identically distributed with finite
standard deviation $\sigma$ and c.d.f. $F$ with bounded and uniformly continuous density $f$.

One can easily deduce from $H^2 = H = H^\top$ that

$$\mathbb{E}\,(H\varepsilon)_i (H\varepsilon)_j = \sigma^2 H_{ij}, \qquad (1)$$

whereas the Cauchy-Schwarz inequality implies that

$$|H_{ij}| \le \sqrt{H_{ii} H_{jj}}.$$

Hence

$$|H_{ij}| \le \eta \quad\text{for } 1 \le i, j \le n. \qquad (2)$$

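Pretending that the empirical c.d.f. $\hat F$ of the errors $\varepsilon_i$ and the empirical c.d.f. $\check F$ of the
residuals $\hat\varepsilon_i$ are sufficiently close to $F$, we write

$$R_i = n \hat F(\varepsilon_i) \approx n F(\varepsilon_i),$$

$$\hat R_i = n \check F(\hat\varepsilon_i) \approx n F(\hat\varepsilon_i) = n F(\varepsilon_i - (H\varepsilon)_i).$$

But $(H\varepsilon)_i$ is quite small, precisely,

$$\mathbb{E}\,(H\varepsilon)_i^2 = \sigma^2 H_{ii} \le \sigma^2 \eta$$

by (1). Hence we write

$$\hat R_i - R_i \approx - n f(\varepsilon_i) (H\varepsilon)_i.$$

Moreover, for $i, j \in \{1, 2, \ldots, n\}$ and $\ell \in \{i, j\}$,

$$(H\varepsilon)_\ell = \sum_{k=1}^n H_{\ell k} \varepsilon_k \approx \sum_{k \notin \{i,j\}} H_{\ell k} \varepsilon_k,$$

because $\sum_{k \in \{i,j\}} H_{\ell k} \varepsilon_k$ is very small in the sense that

$$\mathbb{E}\,\Bigl( \sum_{k \in \{i,j\}} H_{\ell k} \varepsilon_k \Bigr)^2 = \sigma^2 \sum_{k \in \{i,j\}} H_{\ell k}^2 \le 2 \sigma^2 \eta^2$$

by (2). Thus we pretend that the random pairs $(\varepsilon_i, \varepsilon_j)$ and $((H\varepsilon)_i, (H\varepsilon)_j)$ are
stochastically independent and conjecture that

$$\mathbb{E}\,(\hat R_i - R_i)(\hat R_j - R_j) \approx n^2\, \mathbb{E}\, f(\varepsilon_i) f(\varepsilon_j) (H\varepsilon)_i (H\varepsilon)_j
\approx n^2\, \mathbb{E}\, f(\varepsilon_i) f(\varepsilon_j)\; \mathbb{E}\,(H\varepsilon)_i (H\varepsilon)_j
= n^2 \sigma^2\, \mathbb{E}\, f(\varepsilon_i) f(\varepsilon_j)\, H_{ij}. \qquad (3)$$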

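Now consider the special case of $F = \Phi(\sigma^{-1}\,\cdot)$ and $f = \sigma^{-1} \phi(\sigma^{-1}\,\cdot)$ with the standard
Gaussian c.d.f. $\Phi$ and density $\phi$. For $i \ne j$,

$$\sigma^2\, \mathbb{E}\, f(\varepsilon_i) f(\varepsilon_j) = \Bigl( \sigma^{-1} \int \phi(\sigma^{-1} x)^2 \, dx \Bigr)^2 = \Bigl( \int \phi(x)^2 \, dx \Bigr)^2 = \frac{1}{4\pi},$$

and

$$\sigma^2\, \mathbb{E}\, f(\varepsilon_i)^2 = \sigma^{-1} \int \phi(\sigma^{-1} x)^3 \, dx
= (2\pi)^{-1} \int \phi(\sqrt{3}\, x) \, dx = \frac{1}{2\pi\sqrt{3}}.$$

Hence the conjectured approximation (3) equals

$$\frac{n^2 H_{ij}}{2 \sqrt{4 - \delta_{ij}}\, \pi}.$$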
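The two Gaussian constants entering the heuristic approximation, $\bigl(\int \phi^2\bigr)^2 = 1/(4\pi)$ and $\int \phi^3 = 1/(2\pi\sqrt{3})$, can be confirmed numerically. The sketch below uses a plain trapezoidal rule on $[-10, 10]$; the interval, the step count and the helper name `integrate` are arbitrary choices made for this illustration.

```python
import math

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(g, lo=-10.0, hi=10.0, steps=20000):
    """Composite trapezoidal rule; accurate here because the integrands decay quickly."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi)) + sum(g(lo + m * h) for m in range(1, steps))
    return total * h

# With sigma = 1 these are sigma^2 E f(eps_i) f(eps_j) for i != j and for i = j.
const_off_diag = integrate(lambda x: phi(x) ** 2) ** 2
const_diag = integrate(lambda x: phi(x) ** 3)

def closed_form(delta_ij):
    """1 / (2 pi sqrt(4 - delta_ij)), the factor multiplying n^2 H_ij."""
    return 1.0 / (2.0 * math.pi * math.sqrt(4.0 - delta_ij))

print(const_off_diag, closed_form(0))
print(const_diag, closed_form(1))
```

Both cases are thus covered by the single factor $1/(2\sqrt{4 - \delta_{ij}}\,\pi)$.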



4 Proofs 

Proof of Lemma 1. We write $\hat\varepsilon = G\varepsilon$ with the companion hat matrix $G := I - H$
describing the orthogonal projection on the orthogonal complement of the column space
of $D$. Since

$$\hat\varepsilon_i - \hat\varepsilon_j = \sum_{k=1}^n (G_{ik} - G_{jk}) \varepsilon_k,$$

we may conclude that $\hat\varepsilon_i - \hat\varepsilon_j$ has a continuous distribution unless

$$G_{ik} = G_{jk} \quad\text{for } k = 1, 2, \ldots, n. \qquad (4)$$

In the latter case, $\hat\varepsilon_i = \hat\varepsilon_j$ almost surely. But condition (4) is equivalent to

$$0 = \sum_{k=1}^n (G_{ik} - G_{jk})^2 = G_{ii} + G_{jj} - 2 G_{ij}, \qquad (5)$$

where we utilized $G^2 = G = G^\top$. Note further that (4) entails that

$$G_{ii} = G_{ji} = G_{ij} = G_{jj}.$$

Hence (4) implies that

$$G_{ii} = G_{jj} = G_{ij}. \qquad (6)$$

Obviously the latter condition yields (5). Consequently, the three conditions (4), (5) and
(6) are equivalent. Since $G = I - H$, one may reformulate (6) as

$$H_{ii} = H_{jj} = H_{ij} + 1.$$

Finally, note that $H_{\ell m} = \sigma^{-2}\, \mathbb{E}\,(H\varepsilon)_\ell (H\varepsilon)_m$. In particular, $|H_{ij}| \le \sqrt{H_{ii} H_{jj}}$. Hence
it follows from $H_{ii} = H_{jj} = H_{ij} + 1$ that $H_{ii} \ge 1 - H_{ii}$, i.e. $H_{ii} \ge 1/2$. □

A key ingredient for the proof of Theorem 1 is an elementary equality for bivariate
Gaussian distributions:



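Lemma 2. Let $Y$ be a random vector with distribution $\mathcal{N}_2(0, \Sigma)$, where $\Sigma_{11}, \Sigma_{22} > 0$.
Then

$$\mathbb{P}(Y_1 < 0 \text{ and } Y_2 < 0) = \frac{\pi/2 + \arcsin(\rho)}{2\pi} \quad\text{with}\quad \rho := \frac{\Sigma_{12}}{\sqrt{\Sigma_{11} \Sigma_{22}}}.$$

Proof of Lemma 2. Since the probability in question does not change when we replace
$Y_i$ with $\Sigma_{ii}^{-1/2} Y_i$, we may assume without loss of generality that $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \rho$.
If $Z$ denotes a random vector with standard Gaussian distribution on $\mathbb{R}^2$, then $Y$ has the
same distribution as $[Z_1, \rho Z_1 + \bar\rho Z_2]^\top$, where $\bar\rho := \sqrt{1 - \rho^2}$. Now we write $\rho = \sin(\alpha)$
and $\bar\rho = \cos(\alpha)$ with $\alpha := \arcsin(\rho) \in [-\pi/2, \pi/2]$, and $Z = [R\cos(\Theta), R\sin(\Theta)]^\top$, where
$R := \|Z\| > 0$ almost surely, and $\Theta$ is uniformly distributed on $[0, 2\pi]$. Then

$$\begin{aligned}
\mathbb{P}(Y_1 < 0 \text{ and } Y_2 < 0)
&= \mathbb{P}\bigl(\cos(\Theta) < 0 \text{ and } \sin(\alpha)\cos(\Theta) + \cos(\alpha)\sin(\Theta) < 0\bigr) \\
&= \mathbb{P}\bigl(\cos(\Theta) < 0 \text{ and } \sin(\alpha + \Theta) < 0\bigr) \\
&= \mathbb{P}\bigl(\Theta \in [\pi/2, 3\pi/2] \text{ and } \alpha + \Theta \in [\pi, 2\pi] + 2\pi\mathbb{Z}\bigr) \\
&= \mathbb{P}\bigl(\Theta \in [\pi - \alpha, 3\pi/2]\bigr) \\
&= \frac{\pi/2 + \alpha}{2\pi}.
\end{aligned}$$

□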
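The polar-coordinate argument in this proof lends itself to a deterministic check: since only the angle $\Theta$ matters, the probability is the fraction of angles in $[0, 2\pi)$ for which $\cos(\Theta) < 0$ and $\sin(\alpha + \Theta) < 0$. The sketch below counts this fraction on a fine grid and compares it with the closed form; the grid size and the function names are assumptions made for illustration.

```python
import math

def orthant_probability_grid(rho, m=100000):
    """Fraction of grid angles theta with cos(theta) < 0 and sin(alpha + theta) < 0."""
    alpha = math.asin(rho)
    hits = 0
    for t in range(m):
        theta = 2.0 * math.pi * (t + 0.5) / m   # midpoints of m equal arcs
        if math.cos(theta) < 0.0 and math.sin(alpha + theta) < 0.0:
            hits += 1
    return hits / m

def orthant_probability_formula(rho):
    """Closed form of Lemma 2."""
    return (math.pi / 2.0 + math.asin(rho)) / (2.0 * math.pi)

for rho in (-0.9, -0.3, 0.0, 0.5, 0.8):
    print(rho, orthant_probability_grid(rho), orthant_probability_formula(rho))
```

For $\rho = 0$ both values equal $1/4$, the probability that two independent centered Gaussian variables are both negative.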



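Proof of Theorem 1. According to Lemma 1,

$$R_i = 1 + \sum_{k \ne i} 1\{\varepsilon_k < \varepsilon_i\} = 1 + \sum_{k \ne i} 1\{a_{ki}^\top \varepsilon < 0\} \quad\text{and}$$

$$\hat R_i = 1 + \sum_{k \ne i} 1\{\hat\varepsilon_k < \hat\varepsilon_i\} = 1 + \sum_{k \ne i} 1\{\hat a_{ki}^\top \varepsilon < 0\}$$

almost surely, where $a_{ki} := e_k - e_i$ and $\hat a_{ki} := G(e_k - e_i)$ with the standard basis
$e_1, e_2, \ldots, e_n$ of $\mathbb{R}^n$. Consequently it follows from Lemma 2 that

$$\begin{aligned}
\mathbb{E}\,(\hat R_i - R_i)(\hat R_j - R_j)
&= \sum_{k \ne i,\, \ell \ne j} \mathbb{E}\,\bigl(1\{\hat a_{ki}^\top \varepsilon < 0\} - 1\{a_{ki}^\top \varepsilon < 0\}\bigr)\bigl(1\{\hat a_{\ell j}^\top \varepsilon < 0\} - 1\{a_{\ell j}^\top \varepsilon < 0\}\bigr) \\
&= \sum_{k \ne i,\, \ell \ne j} \Bigl( \mathbb{P}(\hat a_{ki}^\top \varepsilon < 0, \hat a_{\ell j}^\top \varepsilon < 0) + \mathbb{P}(a_{ki}^\top \varepsilon < 0, a_{\ell j}^\top \varepsilon < 0) \\
&\qquad\qquad - \mathbb{P}(\hat a_{ki}^\top \varepsilon < 0, a_{\ell j}^\top \varepsilon < 0) - \mathbb{P}(a_{ki}^\top \varepsilon < 0, \hat a_{\ell j}^\top \varepsilon < 0) \Bigr) \\
&= \frac{1}{2\pi} \sum_{k \ne i,\, \ell \ne j} \Bigl( \arcsin(\cos(\hat a_{ki}, \hat a_{\ell j})) + \arcsin(\cos(a_{ki}, a_{\ell j})) \\
&\qquad\qquad - \arcsin(\cos(\hat a_{ki}, a_{\ell j})) - \arcsin(\cos(a_{ki}, \hat a_{\ell j})) \Bigr),
\end{aligned}$$

where

$$\cos(v, w) := \frac{v^\top w}{\|v\| \, \|w\|} \quad\text{for } v, w \in \mathbb{R}^n \setminus \{0\}.$$

Note that

$$a_{ki}^\top a_{\ell j} = (e_k - e_i)^\top (e_\ell - e_j) = \delta_{k\ell} + \delta_{ij} - \delta_{kj} - \delta_{i\ell} =: \Delta_{k\ell,ij},$$

and with

$$H_{k\ell,ij} := (e_k - e_i)^\top H (e_\ell - e_j) = H_{k\ell} + H_{ij} - H_{kj} - H_{i\ell},$$

we may write

$$\hat a_{ki}^\top \hat a_{\ell j} = \hat a_{ki}^\top a_{\ell j} = a_{ki}^\top \hat a_{\ell j} = G_{k\ell} + G_{ij} - G_{kj} - G_{i\ell} = \Delta_{k\ell,ij} - H_{k\ell,ij}.$$

In particular, $\|a_{ki}\|^2 = 2$ and $\|\hat a_{ki}\|^2 = 2 - H_{kk,i}$ for $k \ne i$. Hence we obtain the formula

$$\mathbb{E}\,(\hat R_i - R_i)(\hat R_j - R_j)
= \frac{1}{2\pi} \sum_{k \ne i,\, \ell \ne j} \Bigl(
\arcsin\Bigl(\frac{\Delta_{k\ell,ij}}{2}\Bigr)
+ \arcsin\Bigl(\frac{\Delta_{k\ell,ij} - H_{k\ell,ij}}{\sqrt{(2 - H_{kk,i})(2 - H_{\ell\ell,j})}}\Bigr)
- \arcsin\Bigl(\frac{\Delta_{k\ell,ij} - H_{k\ell,ij}}{\sqrt{2(2 - H_{\ell\ell,j})}}\Bigr)
- \arcsin\Bigl(\frac{\Delta_{k\ell,ij} - H_{k\ell,ij}}{\sqrt{2(2 - H_{kk,i})}}\Bigr)
\Bigr).$$

But the restriction to indices $k \ne i$ and $\ell \ne j$ is superfluous, because $\Delta_{k\ell,ij} = H_{k\ell,ij} = 0$
whenever $k = i$ or $\ell = j$. This yields the first asserted formula.

In the special case of $i = j$, note that $\Delta_{k\ell,ii} = 1 + \delta_{k\ell}$ if $i \notin \{k, \ell\}$. If we replace $\Delta_{k\ell,ii}$
with $1 + \delta_{k\ell}$ in our formula for $\mathbb{E}\,(\hat R_i - R_i)^2$, we end up with the expression

$$\frac{1}{2\pi} \sum_{k,\ell=1}^n \Bigl(
\arcsin\Bigl(\frac{1 + \delta_{k\ell}}{2}\Bigr)
+ \arcsin\Bigl(\frac{1 + \delta_{k\ell} - H_{k\ell,i}}{\sqrt{(2 - H_{kk,i})(2 - H_{\ell\ell,i})}}\Bigr)
- \arcsin\Bigl(\frac{1 + \delta_{k\ell} - H_{k\ell,i}}{\sqrt{2(2 - H_{\ell\ell,i})}}\Bigr)
- \arcsin\Bigl(\frac{1 + \delta_{k\ell} - H_{k\ell,i}}{\sqrt{2(2 - H_{kk,i})}}\Bigr)
\Bigr).$$

But for $k = i$ or $\ell = i$ the corresponding summands are equal to zero, because $k = i$
implies that $H_{k\ell,i} = H_{kk,i} = 0$, and $\ell = i$ implies that $H_{k\ell,i} = H_{\ell\ell,i} = 0$. Distinguishing
the cases $k = \ell$ and $k \ne \ell$ yields

$$\mathbb{E}\,(\hat R_i - R_i)^2
= \frac{1}{2\pi} \sum_{k=1}^n \Bigl( \pi - 2 \arcsin\bigl(\sqrt{1 - H_{kk,i}/2}\,\bigr) \Bigr)
+ \frac{1}{\pi} \sum_{1 \le k < \ell \le n} \Bigl(
\frac{\pi}{6}
+ \arcsin\Bigl(\frac{1 - H_{k\ell,i}}{\sqrt{(2 - H_{kk,i})(2 - H_{\ell\ell,i})}}\Bigr)
- \arcsin\Bigl(\frac{1 - H_{k\ell,i}}{\sqrt{2(2 - H_{\ell\ell,i})}}\Bigr)
- \arcsin\Bigl(\frac{1 - H_{k\ell,i}}{\sqrt{2(2 - H_{kk,i})}}\Bigr)
\Bigr).$$

Finally the assertion follows from the well-known fact that $\arcsin(\sqrt{1 - t}) = \arccos(\sqrt{t})$
for $0 \le t \le 1$. □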
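One way to read the diagonal ($k = \ell$) summands of the final formula: $\bigl(\pi - 2\arccos(\sqrt{H_{kk,i}/2})\bigr)/(2\pi)$ is the probability that the order of offices $k$ and $i$ differs between errors and residuals, because the squared difference of the two indicators equals one exactly when the signs of $a_{ki}^\top\varepsilon$ and $\hat a_{ki}^\top\varepsilon$ differ. The following Monte Carlo sketch checks this for one pair; the design (simple linear regression, $n = 10$ equispaced points), the pair $(k, i)$, the seed and the number of replications are all assumptions made for illustration.

```python
import math
import random

n = 10
x = [float(v) for v in range(n)]            # hypothetical equispaced design
mean = sum(x) / n
sxx = sum((v - mean) ** 2 for v in x)
# Closed-form hat matrix for the columns (1, x).
H = [[1.0 / n + (a - mean) * (b - mean) / sxx for b in x] for a in x]

i, k = 4, 9
h_kk_i = H[k][k] + H[i][i] - 2.0 * H[k][i]  # H_{kk,i}
exact_flip = (math.pi - 2.0 * math.acos(math.sqrt(h_kk_i / 2.0))) / (2.0 * math.pi)

# Weights of eps in the residual difference: (e_k - e_i)^T (I - H) eps.
w = [(m == k) - (m == i) - H[k][m] + H[i][m] for m in range(n)]

random.seed(1)
n_sim = 20000
flips = 0
for _ in range(n_sim):
    eps = [random.gauss(0.0, 1.0) for _ in range(n)]
    err_diff = eps[k] - eps[i]                        # a_ki^T eps
    res_diff = sum(a * b for a, b in zip(w, eps))     # hat a_ki^T eps
    flips += (err_diff < 0.0) != (res_diff < 0.0)
mc_flip = flips / n_sim

print(exact_flip, mc_flip)
```

The simulated frequency should agree with the exact value up to Monte Carlo error of order $n_{\mathrm{sim}}^{-1/2}$.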



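Proof of Theorem 2. First recall that $|H_{k\ell}| \le \sqrt{H_{kk} H_{\ell\ell}} \le \eta$, whence $|H_{k\ell,ij}| \le 4\eta$.
Furthermore, $\Delta_{k\ell,ij} = \delta_{ij}$ whenever $\{k, \ell\} \cap \{i, j\} = \emptyset$ and $k \ne \ell$, i.e. $\Delta_{k\ell,ij} \ne \delta_{ij}$ for
only $O(n)$ index pairs $(k, \ell)$. Elementary calculus shows that

$$|\arcsin(x) - \arcsin(y)| \le C \sqrt{|x - y|}$$

for some constant $C$, the optimal one being $\pi/\sqrt{2}$. Hence each of the three quantities

$$\arcsin\Bigl(\frac{d - H_{k\ell,ij}}{\sqrt{(2 - H_{kk,i})(2 - H_{\ell\ell,j})}}\Bigr), \quad
\arcsin\Bigl(\frac{d - H_{k\ell,ij}}{\sqrt{2(2 - H_{kk,i})}}\Bigr), \quad
\arcsin\Bigl(\frac{d - H_{k\ell,ij}}{\sqrt{2(2 - H_{\ell\ell,j})}}\Bigr)$$

equals $\arcsin(d/2) + O(\eta^{1/2})$, uniformly in $k, \ell, i, j \in \{1, 2, \ldots, n\}$ and $d = \Delta_{k\ell,ij}, \delta_{ij}$. Consequently,

$$\mathbb{E}\,(\hat R_i - R_i)(\hat R_j - R_j)
= \frac{1}{2\pi} \sum_{k,\ell=1}^n \Bigl(
\arcsin\Bigl(\frac{\delta_{ij}}{2}\Bigr)
+ \arcsin\Bigl(\frac{\delta_{ij} - H_{k\ell,ij}}{\sqrt{(2 - H_{kk,i})(2 - H_{\ell\ell,j})}}\Bigr)
- \arcsin\Bigl(\frac{\delta_{ij} - H_{k\ell,ij}}{\sqrt{2(2 - H_{kk,i})}}\Bigr)
- \arcsin\Bigl(\frac{\delta_{ij} - H_{k\ell,ij}}{\sqrt{2(2 - H_{\ell\ell,j})}}\Bigr)
\Bigr) + O(n \eta^{1/2})$$

uniformly in $i, j \in \{1, 2, \ldots, n\}$. But for $d \in [0, 1]$ and $x, y, z \in [-4\eta, 4\eta]$,

$$\begin{aligned}
&\arcsin\Bigl(\frac{d}{2}\Bigr) + \arcsin\Bigl(\frac{d - x}{\sqrt{(2 - y)(2 - z)}}\Bigr) - \arcsin\Bigl(\frac{d - x}{\sqrt{2(2 - y)}}\Bigr) - \arcsin\Bigl(\frac{d - x}{\sqrt{2(2 - z)}}\Bigr) \\
&\quad= \arcsin\Bigl(\frac{d}{2}\Bigr) - \arcsin\Bigl(\frac{d - x}{2}\Bigr)
+ \Bigl[ \arcsin\Bigl(\frac{d - x}{2}\Bigl(1 + \frac{y + z}{4} + O(\eta^2)\Bigr)\Bigr) - \arcsin\Bigl(\frac{d - x}{2}\Bigr) \Bigr] \\
&\qquad- \Bigl[ \arcsin\Bigl(\frac{d - x}{2}\Bigl(1 + \frac{y}{4} + O(\eta^2)\Bigr)\Bigr) - \arcsin\Bigl(\frac{d - x}{2}\Bigr) \Bigr]
- \Bigl[ \arcsin\Bigl(\frac{d - x}{2}\Bigl(1 + \frac{z}{4} + O(\eta^2)\Bigr)\Bigr) - \arcsin\Bigl(\frac{d - x}{2}\Bigr) \Bigr] \\
&\quad= \arcsin'\Bigl(\frac{d}{2} + O(\eta)\Bigr) \frac{x}{2}
+ \arcsin'\Bigl(\frac{d - x}{2}\Bigr) \frac{d - x}{2} \Bigl( \frac{y + z}{4} - \frac{y}{4} - \frac{z}{4} \Bigr) + O(\eta^2) \\
&\quad= \frac{x}{\sqrt{4 - d^2}} + O(\eta^2).
\end{aligned}$$

Since $\sqrt{4 - \delta_{ij}^2} = \sqrt{4 - \delta_{ij}}$, we obtain

$$\mathbb{E}\,(\hat R_i - R_i)(\hat R_j - R_j) = \frac{1}{2 \sqrt{4 - \delta_{ij}}\, \pi} \sum_{k,\ell=1}^n H_{k\ell,ij} + O\bigl(n \eta^{1/2} + n^2 \eta^2\bigr).$$

But it follows from $H\mathbf{1} = H^\top \mathbf{1} = \mathbf{1}$ that

$$\sum_{k,\ell=1}^n H_{k\ell,ij} = \sum_{k,\ell=1}^n \bigl( H_{k\ell} + H_{ij} - H_{kj} - H_{i\ell} \bigr) = n^2 H_{ij} - n. \qquad □$$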
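The expansion used in this proof, namely that the four-arcsine expression equals $x/\sqrt{4 - d^2}$ up to $O(\eta^2)$ for $d \in [0, 1]$ and $x, y, z \in [-4\eta, 4\eta]$, can be probed numerically. The sketch below evaluates both sides for a few hypothetical values of $\eta$, $d$, $x$, $y$, $z$ chosen for illustration; the function names are likewise assumptions.

```python
import math

def four_arcsin(d, x, y, z):
    """arcsin(d/2) + arcsin(u/sqrt((2-y)(2-z))) - arcsin(u/sqrt(2(2-y))) - arcsin(u/sqrt(2(2-z)))
    with u = d - x."""
    u = d - x
    return (math.asin(d / 2.0)
            + math.asin(u / math.sqrt((2.0 - y) * (2.0 - z)))
            - math.asin(u / math.sqrt(2.0 * (2.0 - y)))
            - math.asin(u / math.sqrt(2.0 * (2.0 - z))))

def leading_term(d, x):
    """First-order term x / sqrt(4 - d^2)."""
    return x / math.sqrt(4.0 - d * d)

errors = []
for eta in (1e-2, 1e-3):
    for d in (0.0, 1.0):
        # Arbitrary test point inside [-4 eta, 4 eta]^3.
        x, y, z = 4.0 * eta, -2.0 * eta, 3.0 * eta
        err = abs(four_arcsin(d, x, y, z) - leading_term(d, x))
        errors.append((eta, d, err))
        print(eta, d, err, err / eta ** 2)
```

The printed ratio of the error to $\eta^2$ staying bounded as $\eta$ decreases is consistent with the $O(\eta^2)$ remainder.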